BetterGedcom - Sync tag table

mstransky 2010-12-21T08:40:42-08:00

Could this work table help everyone?

This table could be the output format for a BG GEDCOM-like file OR a BG XML-like file.

Once the table is complete, each DEV only needs to create a sync list from it for their own app.

It also lets visitors and watchers see what BG is trying to acknowledge as a data field and where we are with it.

mstransky 2010-12-21T13:01:54-08:00

Good, that is great to know the dead is better. Also for explaining how a workaround for length with CONT is implemented.

mstransky 2010-12-21T13:05:37-08:00

"5.5 specifications": I grabbed a list of so-called GEDCOM tags, and even still I can open GED flat files and never see

1 _SCHEMA
2 INDI
3 _FA1
4 LABL Fact 1
3 _FA2
4 LABL Fact 2
3 _FA3
4 LABL Fact 3

OR.......

4 LABL Fact 13
3 _MREL
4 LABL Relationship to Mother
3 _FREL
4 LABL Relationship to Father
2 FAM
3 _FA1
4 LABL Marriage fact
3 _FA2
4 LABL Fact 2
3 _MSTAT

0 @F112@ FAM
1 HUSB @I352@
1 WIFE @I353@
1 CHIL @I134@
2 _FREL Natural
2 _MREL Natural

There are many tags where we can get an idea of what they are used for, but for some even I wonder what the purpose was, or how to sync to them so they are properly included.

ttwetmore 2010-12-21T13:52:21-08:00

The tags starting with underscore characters are custom additions. _MREL and _FREL are used to indicate the type of relationship a child has to his/her mother/father (Natural, Adopted). The _FA's are custom facts. You'd have to get info from the companies that use them to fully understand them. The _SCHEMA in the header record is supposed to warn the importing program what kinds of extra tags it can expect to find within the file.
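As a rough illustration of the underscore convention described above, here is a minimal Python sketch that lists the custom tags found in a piece of GEDCOM text. The function name `custom_tags` and the line pattern are assumptions made for this example, not part of any GEDCOM library, and the sample is the FAM record quoted earlier in this thread.

```python
import re

# Hypothetical line pattern: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")


def custom_tags(gedcom_text):
    """Return the sorted, de-duplicated tags that start with an underscore."""
    found = set()
    for raw in gedcom_text.splitlines():
        match = LINE_RE.match(raw)
        if match and match.group(3).startswith("_"):
            found.add(match.group(3))
    return sorted(found)


SAMPLE = """\
0 @F112@ FAM
1 HUSB @I352@
1 WIFE @I353@
1 CHIL @I134@
2 _FREL Natural
2 _MREL Natural
"""

print(custom_tags(SAMPLE))  # ['_FREL', '_MREL']
```

A list like this only tells you which custom tags a file uses; what each one means still has to come from the vendor or from research.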

If you can't find that someone else has already researched these tags and listed what they all mean, I'm afraid it's up to you or anyone else interested to make the contacts necessary to figure them all out. But I'll bet you can get a good indication of what they all mean with a little googling.

Tom Wetmore

mstransky 2010-12-21T14:00:27-08:00

"But I'll bet you can get a good indication of what they all mean"-tom

Yes, I can. It is just sad that when an app strives to be the software supremo, they talk about the structures and models but leave the users and devs having to resort to riddles and to piecing information together piecemeal.

This is the whole point: I guess there never was any push to be open about trading data. If devs could not piece the riddles together 100%, they would cut the data out and move on, or make their own tags as exports with the attitude "deal with it yourself".

That is what I am trying to avoid.

louiskessler 2010-12-21T19:02:43-08:00

Mike, Tom:

First, GEDCOM 5.5 (I would use 5.5.1) defines all the tags that are valid. Mike, if you want a list, the GEDCOM spec itself is the best place to go.

The Schema system was deprecated prior to 5.5. Lots of old GEDCOMs, especially FTM, use it, but they use it very trivially. It was an attempt to extend the language, but was deemed too complex for most programmers to want to implement.

But I don't know why you are going through so much trouble. Tags are not the most important part of BetterGEDCOM. In fact, I'd like to minimize them, and allow custom tags. Why, if we could get away with essentials like birth, death and marriage and use the generic "EVENT" with a "TYPE" descriptor for everything else, that might work fine.
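A minimal Python sketch of that idea, under stated assumptions: the set of "essential" tags, the function name `to_generic`, and the dictionary layout are all made up for illustration and are not a proposed BetterGEDCOM structure.

```python
# Assumed set of "essential" event tags that keep their own tag.
CORE_EVENTS = {"BIRT", "DEAT", "MARR"}


def to_generic(tag, value=""):
    """Keep core event tags as-is; fold everything else into a generic EVENT."""
    if tag in CORE_EVENTS:
        return {"tag": tag, "value": value}
    # The original tag survives as the TYPE descriptor, so nothing is lost.
    return {"tag": "EVENT", "type": tag, "value": value}


print(to_generic("BIRT", "1 JAN 1900"))
print(to_generic("_FA1", "Fact 1"))
```

Because the original tag is carried along as the type, a reader that does not know `_FA1` can still pass it through and display it like any other event.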

Custom tags are a GOOD thing about GEDCOM in my eyes - not a bad thing. They allow extension in a trivial way.

I don't really see any purpose in enumerating all the tags used by all the programs.

I think I am qualified to say that, because my program attempts to read in all the flavors of GEDCOM and output a meaningful report. I refer to that as Extended-GEDCOM, because many programs do not follow GEDCOM to the T.

When I encounter a GEDCOM with a new tag, if it doesn't pass through and display correctly like a generic event, then I add some custom code to display it adequately. But that is the exception rather than the rule. In a way, I do almost the same as Tom's Lifelines program does: I read in the GEDCOM directly, make trivial internal changes to convert it to my Extended-GEDCOM data structure, and store it that way.

If I was going to read a BetterGEDCOM file, I would read it in directly, and again convert it to my Extended-GEDCOM data structure.

The key thing here is and my opinion always has been that GEDCOM isn't all that bad. We will always need records for INDI and SOUR and REPO and NOTE and OBJE, we are in debate on whether or not to use FAM, and we probably want to add EVENT and PLAC and CITE and GROUP and some structures for storing the evidence and conclusion models.

Those Records (as they're called in GEDCOM) or Entities (as we seem to be calling them for BetterGEDCOM) are what are important to flesh out. That's what I feel BG needs to work out first, along with the structure/definition of those entities.

Tags that are needed will only become clear once that work is done first.

ttwetmore 2010-12-21T21:38:46-08:00

I agree wholeheartedly with most of Louis's points. Except that I am more of a fan of hard tags than he. Though I agree fully about the value of escapes to custom tags.

I especially agree with the point that GEDCOM is not all that bad, and that the addition of EVENT, PLACE, GROUP records, with ability to handle evidence and conclusions is EXACTLY the way for Better GEDCOM to go. If anyone would care to read the DeadEnds model document one more time one will find that the DeadEnds model is essentially exactly this extended GEDCOM idea. Let me draw some parallels ...

DeadEnds Person Record == GEDCOM INDI record.
DeadEnds Vital Structures == GEDCOM event structures (e.g., BIRT, DEAT) within INDI records.
DeadEnds Relation Structures == GEDCOM associations (ASSO tag) allowing the establishment of relationships between persons without requiring events structures or event records.
DeadEnds Attributes == All the tags and value sets needed in all contexts to provide the PACTs in all record types.
DeadEnds Person References in Person Records == part of GEDCOM extension to handle evidence & conclusions.
DeadEnds Event Record == GEDCOM EVENT record extension required to properly handle events.
DeadEnds Role References in Person and Event Records == inter record linkages needed to connect INDI records with the new EVENT records.
DeadEnds Event References in Event Records == part of GEDCOM extension to handle evidence & conclusions.
DeadEnds Group Record == GEDCOM GROUP record extension, including the GEDCOM FAM record.
DeadEnds Place Records == GEDCOM PLACE record extension to avoid duplication in place information and other good things.
DeadEnds Note Records == GEDCOM NOTE records.
DeadEnds Source Records == GEDCOM SOUR records.
DeadEnds URL Records == GEDCOM OBJE records.
DeadEnds General Entity Records == no proposed GEDCOM extension for.

I am not ready to give up the Family record yet though I understand how it can logically be thought of as a sub type of the Group record.

I am also not convinced of the need for a Citation record. For me a Citation is nothing more than a string that can be generated automatically by looking at the chain of sources for the record in question, created by filling in citation templates that can be extracted from various style manuals.

The fact that I used XML as the language for expressing the DeadEnds model is IMMATERIAL to this point. DeadEnds records can be expressed in GEDCOM syntax using exactly GEDCOM tags wherever possible. The decision on the use of GEDCOM versus XML as the format for the transport file format is wholly independent of any aspect of the structure of the data model behind Better GEDCOM. It is a decision orthogonal to all others having to do with the model itself.

In my humble opinion my DeadEnds model has nearly completely worked out all issues for extending GEDCOM as needed to properly cover events, places, evidence & conclusions, and the necessary ways to express relationships between people. I suggest that it be used as one of the starting points for considering ideas for extending GEDCOM.

Tom Wetmore

GeneJ 2010-12-21T22:35:58-08:00

Tom:

Are you able to provide an array of reasonably complex examples to explain why you believe citations are "nothing more than a string that can be generated automatically by looking at the chain of sources for the record in question, created by filling in citation templates that can be extracted from various style manuals"?

The _Evidence Explained..._ series by Elizabeth Shown Mills has got to be high on the list of authorities whose work we advance as part of BetterGEDCOM.

There are template-like examples in the _Evidence Explained_ series, however, those are examples of *principles* and *practices.* The templates are not intended to cover all combinations and permutations of those underlying practices.

Tom Jones's work with inferential genealogy is surely another authority "high on the list."

What oldGEDCOM called the citation houses much of the heart and soul of my reasoning. I don't see how we can say we are advancing the evidence-conclusion process if we diminish the record of our reasoning.

Thank you. --GJ

ttwetmore 2010-12-22T04:36:34-08:00

GeneJ,

I will put together a simple example. Here are some off-the-cuff definitions that provide the background.

Source == Something in the real world that contains evidence of genealogical significance. The classic example is a book.

Source Record == A record in a database that describes a Source. For a book this would be a record with the book's author, title, and publication information.

Source Reference == A reference inside a genealogical record in a database (e.g., a Person or Event Record) that points to/refers to the Source Record that describes the Source that holds the evidence that justifies that particular genealogical record. The Source Reference may further discriminate the Source, the best example being the page number in a book Source where the specific evidence was found. That is, it is better not to put page numbers inside Source Records, but to put them in the particular references that refer to the Source Record.

Citation == A string to be used in a footnote or as a bibliography entry that describes a Source in an agreed upon conventional format with different parts in specified locations in the string, with specified rules about fonts and quotation marks. This string is automatically generatable by finding the necessary information in the Source Reference (e.g., page number) and in the Source Record referred to (e.g., title, name, publication information.) That is, everything you need to create a citation string is already in the database in Source Records and Source References. There is no need for an additional record type.
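To make the four definitions concrete, here is a minimal Python sketch of a citation string built from a Source Record plus a Source Reference. The class names, field names, and the one-line template are assumptions for illustration only; the template is a simplification and does not follow any particular style manual, and the book data is invented.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Database record describing a real-world Source (here, a book)."""
    author: str
    title: str
    publisher: str
    year: int


@dataclass
class SourceReference:
    """Pointer from a Person/Event record to a SourceRecord, plus the page."""
    source: SourceRecord
    page: str


def format_citation(ref):
    """Fill a simple, made-up footnote template from the two records."""
    s = ref.source
    return f"{s.author}, {s.title} ({s.publisher}, {s.year}), p. {ref.page}."


# Invented example data, not a real publication.
book = SourceRecord("A. Author", "An Example Family History", "Example Press", 1900)
ref = SourceReference(book, "42")
print(format_citation(ref))
# A. Author, An Example Family History (Example Press, 1900), p. 42.
```

The page number lives on the reference, not on the source record, so the same book record can be cited from many places with different pages.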

Tom Wetmore

mstransky 2010-12-22T05:35:04-08:00

Tom, thank you!
Just with that I can understand your structure much better. I can look back and not get confused. Your "Source" is like my "source" and your "source record" is like my "eid". Now when others talk and post, I see I will not mistake records in a source (or, as in yours, a "source record") for the source itself: it is a sub of "source", not a source itself.
If we all gave simple outlines like this, we could all get specific and pinpoint topics.

I will start a new thread because Louis, Tom, others and myself all have gripes about FAM. I am wondering about the root of the like and dislike. "This will be a new topic."

Stuff like this, when others also show the same transparency, brings clarity to the conversations, so when a person talks about "x" others will not assume "z" is included in the discussion. I was at fault for that a few times.

GeneJ 2010-12-22T07:03:59-08:00

@Tom:

Thank you.
Is there a different page on the wiki where we could start a discussion thread and continue this dialog?

The reason I suggested we look at some reasonably complex examples is to move the discussion beyond identifying isolated information that might come from a published book or modern day certificates/vital records.

Would also like to bring up Mark Tucker's work about ideas for online sources. He even had a blog entry called Marc XML. http://www.thinkgenealogy.com/2009/06/20/better-online-citations-marc-xml/

mstransky 2010-12-22T08:17:05-08:00

Just an idea for a catchall in the future, if "WE" cannot find a way to sync, properly importing or exporting ALL tags + data. One idea: we could have something like the GEDCOM NOTE. Here is a workaround.

We know that an import/export error log can track data that is dropped, flagged, and such.

Picture an export AFTER the fact of a BG standard. An APP comes up with a custom TAG called "_WXYZ"; what it will do is export it, and any other tag such as "_TUVW", to a "_CATCHALL".
That catchall place will hold data "gedcomISH", like so:

0 INDI
1 NAME
2
1
2
3
1 _CTAL
2 CONT Hair Color Brown
2 CONC shoe size 9
2 CONC fav food pizza

or in xml like
<set>
<INDI>I5436</INDI>
<NAME>John P./Smith</NAME>
<CTAL>Hair Color Brownshoe size 9fav food pizza</CTAL>
</set>

This way, if there is unsynced data, the DATA can still be exported and handled, and even imported as well.

The user can open a dashboard and view the data like a note. If the new app handles such data fields, the user can create PROPER data entries, then go back to that catch-all note and delete the data that came across through the catch-all.

This approach can be a safety net for data that comes after the fact, or during an import, so that we do NOT drop data that may be very vital information.
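Here is a minimal Python sketch of the catch-all idea, under stated assumptions: the function name `sweep_to_catchall`, the (tag, value) pair layout, and the sample tags `_HAIR` and `_SHOE` are all hypothetical. Unlike the example above, it keeps the original tag name next to each value so the user can tell where the text came from, and it joins the entries with " : ".

```python
def sweep_to_catchall(fields, known_tags, catchall_tag="_CTAL"):
    """Keep fields with known tags; collect everything else into one note.

    fields is a list of (tag, value) pairs for a single record.
    """
    kept, leftovers = [], []
    for tag, value in fields:
        if tag in known_tags:
            kept.append((tag, value))
        else:
            # Preserve the unknown tag name alongside its value.
            leftovers.append(f"{tag} {value}".strip())
    if leftovers:
        kept.append((catchall_tag, " : ".join(leftovers)))
    return kept


record = [("NAME", "John P./Smith"), ("_HAIR", "Brown"), ("_SHOE", "9")]
print(sweep_to_catchall(record, known_tags={"NAME"}))
# [('NAME', 'John P./Smith'), ('_CTAL', '_HAIR Brown : _SHOE 9')]
```

Nothing is dropped on import: whatever the app cannot place in a proper field ends up readable in one note-like entry.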

mstransky 2010-12-22T08:18:13-08:00

That did not post right???
<CTAL>Hair Color Brown : shoe size 9 : fav food pizza</CTAL>

my "/" for : made the text font change?

mstransky 2010-12-21T08:41:39-08:00

http://bettergedcom.wikispaces.com/Sync+tag+table

If not, just delete it, or edit and rename it to something better understood, if there is one.

GeneJ 2010-12-21T08:56:37-08:00

I can't speak for others, but it's nice to be able to see this kind of information in a table form.

mstransky 2010-12-21T10:50:03-08:00

Thanks. Over the years I have never seen a full list of attributes or tag labels for data fields, or their purpose and the reason for the data.

Some posting on this topic might have started here http://bettergedcom.wikispaces.com/message/view/20+Dec+2010+Organizers+Meeting+Notes/31994709

mstransky 2010-12-21T11:08:52-08:00

Tom, it is good that you can control your data straight from your GEDCOM file. I read everything you wrote and keep your app's abilities in mind. One question haunts me. GEDCOM used to chop off data. Does your app overcome this problem and permit unlimited text?
If so, if you export your file as a pure GEDCOM-like file, would other programs chop off your longer-than-permitted strings?

ttwetmore 2010-12-21T12:59:21-08:00

Mike says: "One question haunts me. GEDCOM used to chop off data. Does your app overcome this problem and permit unlimited text?
If so, if you export your file as a pure GEDCOM-like file, would other programs chop off your longer-than-permitted strings?"

My software doesn't limit the length of GEDCOM lines so does not truncate long lines on import. However most users probably keep the values relatively short and use CONT lines to continue long values. I can't say what other programs would do on importing lines with very long values.
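For readers who have not seen continuation lines in use, here is a minimal Python sketch of how a long value might be written out. It relies on the usual reading of the GEDCOM spec that CONT starts a new line of the value while CONC appends to the current one; the function names and the `max_len` default are assumptions for illustration and are not taken from any program mentioned in this thread.

```python
def _line(level, tag, text):
    """Format one GEDCOM line, omitting the value when it is empty."""
    return f"{level} {tag} {text}" if text else f"{level} {tag}"


def emit_value(level, tag, value, max_len=60):
    """Split a long, possibly multi-line value into tag/CONC/CONT lines."""
    lines = []
    for i, para in enumerate(value.split("\n")):
        # Cut each paragraph into pieces of at most max_len characters.
        chunks = [para[j:j + max_len] for j in range(0, len(para), max_len)] or [""]
        # The first paragraph uses the real tag; later ones start with CONT.
        head_level, head_tag = (level, tag) if i == 0 else (level + 1, "CONT")
        lines.append(_line(head_level, head_tag, chunks[0]))
        # Overflow within a paragraph is appended with CONC.
        for chunk in chunks[1:]:
            lines.append(_line(level + 1, "CONC", chunk))
    return lines


for out in emit_value(1, "NOTE", "abcdefgh\nxy", max_len=3):
    print(out)
# 1 NOTE abc
# 2 CONC def
# 2 CONC gh
# 2 CONT xy
```

This sketch cuts at a fixed character count. A real exporter would also need to avoid splitting where a chunk begins or ends with a space, since some importers trim whitespace.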

Tom Wetmore

ttwetmore 2010-12-21T13:01:31-08:00

Mike says, "Over the years I have never seen a full list of attributes or tag labels for data fields, or their purpose and the reason for the data."

I use the 5.5 specifications whenever I have a question about interpreting tags, and can generally figure out what I need to know from there. I have also seen a number of web sites that list the valid tags and give them short descriptions.

Tom Wetmore

mstransky 2010-12-22T05:50:50-08:00

FAM issues

There are those who like it and those who dislike it; what is the root problem?

My observation is this.
1) GEDCOM displays only the nuclear family, and not extended siblings and stepchildren.
2) Or relations are not captured as "step child of..."

2# As far as I can see, many place an "association" to another person, with a data field "Step child of", and point to the desired person. This segment of data is placed within the INDI record sets.
Is this the problem?
2.a Having extra data that can already be calculated by the app, so there is no need to store extra data as redundancy?
2.b That this data is stored in the INDI set, when it should actually be stored within a "Source record" set as part of a notation to citation information?

Please explain, before I babble out assumptions and give my workaround.
1.) I store these "step child of..." as a role a person has from a "source", in a "source record", which is not stored in the INDI data sets. Also, I do not use extra tags to show navigational relations in the navigation area, since any app can calculate relations with no need to handle extra data fields of information.

Why is the overall outcome of FAM an issue? Data storage, poor linkage of people, lack of clarity in evidence record keeping, or other?

AdrianB38 2010-12-23T10:04:13-08:00

Mike - a couple of thoughts from me on this.

1. If the step-child issue is handled as an association in GEDCOM, then it's different from the handling of a biological child. One is in the FAM and the other not. _IF_ I want to go beyond the biological family, then an inconsistent method of entering (a) offends me anyway, (b) requires I remember more than 1 way of entering, (c) requires me to do something different to print them out.

Sub-issue - some people get very uptight about the FAM containing anything other than biological children. If a more neutral concept were used (e.g. a GROUP entity type, with a sub-type of "Family") which had roles for the relationships between PERSON entity type and GROUP for biological, step, adopted, etc., this might be more acceptable and consistent.

2. Just relating a PERSON entity to a GROUP entity (sub-type "Family") or FAMILY entity with a role of "step-child" omits data - which parent is the step-parent and which the biological? I believe this can be done by referring back to the "Birth" event and finding who the biological parents are (assumption - the bio parents also participate in the event with suitable roles). HOWEVER, it is arguable that there is duplication here with the family data.
NB - I believe that I need to explicitly designate some children as step-children and some not. There is a difference between children who never live with their step-parent, because they're old enough, and those who do live with them. Currently I do this by adding them into the family in GEDCOM - there isn't any other event for this.

3. Therefore some argue that the Family (sub-)entity is necessary as it can be derived from the birth (or adoption) events. I do not agree as
(i) we can justify a GROUP entity type for all sorts of reasons (business partnerships, regiments, etc)
(ii) so we might as well have a family sub-type of the GROUP.
(iii) I've got loads of data stored against families in GEDCOM that are genuinely applicable to the family as a whole - e.g. notes on their residences, etc. SO I want somewhere to put them.
(iv) if we remove "Family" we will baffle normal people!!

In summary, it's about inconsistent and therefore difficult and potentially wrong inputs for the different types of children.
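Point 2 above, working out which parent is the step-parent by referring back to the birth event, can be sketched in a few lines of Python. This is only an illustration under the same assumption stated there (the biological parents take part in the birth event with suitable roles); the function name, the role names, and the person IDs are made up.

```python
def step_parents(family_parents, birth_roles):
    """Return family parents who had no parent role in the child's birth event.

    family_parents: person IDs holding a parent role in the family group.
    birth_roles: mapping of role name to person ID for the birth event.
    """
    biological = {pid for role, pid in birth_roles.items()
                  if role in ("father", "mother")}
    return [pid for pid in family_parents if pid not in biological]


# Made-up IDs: P@60 is the biological mother, P@55 joined the family later.
family_parents = ["P@55", "P@60"]
birth_roles = {"child": "P@76", "father": "P@41", "mother": "P@60"}
print(step_parents(family_parents, birth_roles))  # ['P@55']
```

The answer is derived rather than stored, which is exactly where the duplication argument comes from: the family record and the birth event both carry part of the picture.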

AdrianB38 2010-12-23T10:06:09-08:00

D'oh!!!

3. Therefore some argue that the Family (sub-)entity is necessary as it can be derived from the birth (or adoption) events

should of course read ...
3. Therefore some argue that the Family (sub-)entity is NOT necessary as it can be derived from the birth (or adoption) events

mstransky 2010-12-23T12:33:47-08:00

Ok I follow you,
1.---------

One is in the FAM and the other not.
FAM stores biological children:
0 @F4545@ FAM
1 CHIL @I1234@

and the other I think they do like so.
It could be biological or not, like adoptions.
0 @I1234@ INDI
1 ASSO "that relationship"

2.--------
Sure, they give you a place to enter associations, but it may or may not overwrite or duplicate a relation in one screen and only display the information in another screen, which can get confusing to the viewer.

Tom just posted a PDF called event-based relations, or something to that effect. That is what I have always done with my data over the years, because of what you are pointing out.

For your (i) and (iii) above, this is how I enter such group data.

You have a business document that lists the two partners. That I call a source.
View source record #45 (THAT IS AN EVENT IN TIME).
You will have entered the two separate people as EID sets. This record captures the evidence in that single source: the dates, people's roles or relations, addresses, and any info you find within.
That same record set will POINT to the actual people in the outline and point at the source you pulled it from.

E@12 | S@67 | P@123 | Partner #1
E@13 | S@67 | P@165 | Partner #2
E@14 | S@88 | P@55 | Step Father
E@15 | S@88 | P@76 | Step Daughter

If one wanted a group event like a business venture, one selects S@67, which includes both of the people from a business transaction and their roles.
If you entered (JUST DON'T EVEN THINK ABOUT THE NAVIGATION TREE, THAT IS SEPARATE) 5 people from a census, which could have all the siblings and a nephew living at the ADDRESS:
E@16 | S@99 | P@109 | Head or father
E@17 | S@99 | P@110 | Wife or mother
E@18 | S@99 | P@114 | Child
E@19 | S@99 | P@165 | Nephew
E@20 | S@99 | P@212 | Adopted
E@21 | S@99 | L@56 | 137 Meeker Ave NY

So that sits beside the normal family view, which does biological views and lawful relations. That outline has NO RECORDS IN IT; the outline only acts like a place marker. BUT because my storage is event based, the outline tree person acts like a filter, via the ID, to list all records they are tied to.

Also, an event can pull up ALL persons or places tied to an event as a group, WITH or without any biological relations from a FAM record. It is based on SOURCES and includes each observable person and location in that SOURCE.
Yet I still have an outline display!!!!
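The two lookups described above (everything tied to one person, and everyone tied to one source) can be sketched in Python using the rows listed earlier in this post. The `Row` layout and the function names are assumptions for illustration, not the actual storage used by any app.

```python
from collections import namedtuple

# Hypothetical row layout mirroring the "E | S | P or L | role" lines above.
Row = namedtuple("Row", "eid source target role")

ROWS = [
    Row("E@12", "S@67", "P@123", "Partner #1"),
    Row("E@13", "S@67", "P@165", "Partner #2"),
    Row("E@14", "S@88", "P@55", "Step Father"),
    Row("E@15", "S@88", "P@76", "Step Daughter"),
    Row("E@16", "S@99", "P@109", "Head or father"),
    Row("E@17", "S@99", "P@110", "Wife or mother"),
    Row("E@18", "S@99", "P@114", "Child"),
    Row("E@19", "S@99", "P@165", "Nephew"),
    Row("E@20", "S@99", "P@212", "Adopted"),
    Row("E@21", "S@99", "L@56", "137 Meeker Ave NY"),
]


def records_for(target_id, rows=ROWS):
    """The outline person (or place) acts as a filter over all rows."""
    return [r for r in rows if r.target == target_id]


def group_for(source_id, rows=ROWS):
    """Everyone and everything observed in one source, as a group."""
    return [r for r in rows if r.source == source_id]


print([r.eid for r in records_for("P@165")])   # ['E@13', 'E@19']
print([r.target for r in group_for("S@67")])   # ['P@123', 'P@165']
```

Person P@165 shows up both as a business partner and as a nephew in the census, without either fact being stored inside an INDI record.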

"(iv) if we remove "Family" we will baffle normal people!!" - Adrian
Answer: This is why I keep it, but play down its importance to be nothing more than a navigational place-marker that shows only generic biological relations and generic start and end dates and places per person; the same goes for the place/location tables.

All I have done is stopped storing data inside the INDI records.

I hope this might be an option to consider, or that others might get an idea from it.

mstransky 2010-12-23T13:25:17-08:00

Another big reason for doing this is sharing. People really don't share people and their records; that is the trivial side effect of sharing documents.

What do I mean? Say you want to share two people that are found in a census and a birth cert. Your family member does not have this.

1. GEDCOM: You send him the two INDI records with a cluster of internal tags that may or may not mesh with his collection of internal tags.

However, by sharing the two source documents, you send him SOUR#1, which has 5 people described inside that document, and a birth cert that describes 3 people, the baby and the parents.
HE/SHE accepts these EVENT-based records of collected sources: 2 sources and 8 event-role people.
2. He can see there are others in the document, and can quickly link (point) the described role-playing people in the source to the people in his tree, or create a new person placeholder in the tree outline.

Old GEDCOM: you should never assume there is a person to add to a tree, add them, and then go find proof. YOU find the proof, then, gee, a person existed back then, THEN add them to a tree outline.
My opinion: GEDCOM relied on the proof of navigational pedigrees and then went to find supporting proof of them. I like the proof of records first; then you make a placeholder person to point all the proof to for quick views.

Now that I hear myself talking, I will stop. It might be good for some other input. I can bring this point of "DB-structured" "Event based genealogy vs. Individual based genealogy" to another thread.

But I think we need to see more input from others about FAM issues that are the same or ones either of us did not touch.

Custom App (your tag) | BG equivalent tag (when created) | GEDCOM tag | Data type (string, number, text...) | Definition of data and use
 |  | INDI | Number | INDIVIDUAL: A person's unique number in a database.
 |  | FAM | Number | FAMILY: A family's unique number in a database.
 |  | " | " | "
 |  | " | " | "
